Feature Selection and Ensemble Learning Techniques in One-Class Classifiers: An Empirical Study of Two-Class Imbalanced Datasets
Authors
Abstract
Class imbalance learning is an important research problem in data mining and machine learning. Most solutions, including data-level, algorithm-level, and cost-sensitive approaches, are derived using multi-class classifiers, depending on the number of classes to be classified. One-class classification (OCC) techniques, in contrast, have been widely used for anomaly or outlier detection, where only normal (positive) class training data are available. In this study, we treat every two-class imbalanced dataset as an anomaly detection problem, which contains a larger majority class, i.e. the normal class, and a very small minority class. The objectives of this paper are to understand the performance of OCC classifiers and to examine the level of improvement when feature selection is considered as a pre-processing step and ensemble learning is employed to combine multiple OCC classifiers. Based on 55 datasets covering different ranges of class imbalance ratios, with one-class support vector machine, isolation forest, and local outlier factor as the representative OCC classifiers, we found that OCC classifiers are good at handling high-imbalance-ratio datasets, outperforming the C4.5 baseline. In most cases, though, performing feature selection does not improve the OCC classifiers most. However, many homogeneous and heterogeneous OCC classifier ensembles do outperform single OCC classifiers, and some specific combinations, both with and without feature selection, perform similar to or better than the baseline combination of SMOTE and C4.5.
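The one-class setup the abstract describes can be sketched briefly: an OCC model is trained on the majority (normal) class only, then minority-class instances are expected to be flagged as outliers at prediction time. The following is a minimal illustration assuming scikit-learn and synthetic data; the dataset, parameters, and evaluation here are illustrative only and are not taken from the paper.

```python
# Sketch of one-class classification for a two-class imbalanced problem:
# fit on majority-class data only, then flag minority points as outliers.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic imbalanced data: 500 majority points vs. 25 minority points.
majority = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
minority = rng.normal(loc=4.0, scale=0.5, size=(25, 2))

# Hold out 100 majority points for testing; label 1 = inlier, -1 = outlier.
X_test = np.vstack([majority[:100], minority])
y_test = np.array([1] * 100 + [-1] * 25)

for model in (OneClassSVM(nu=0.05), IsolationForest(random_state=0)):
    model.fit(majority[100:])          # train on the normal class only
    pred = model.predict(X_test)       # returns +1 (inlier) or -1 (outlier)
    # Fraction of minority-class points correctly flagged as outliers:
    recall = np.mean(pred[y_test == -1] == -1)
    print(type(model).__name__, round(float(recall), 2))
```

Homogeneous or heterogeneous ensembles, as studied in the paper, would combine the `predict` outputs of several such models (e.g. by majority vote) rather than relying on a single one.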
Related Articles
A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets
Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this resea...
A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets
costs for the positive and negative classes, SVM can be extended to the cost-sensitive setting by introducing an additional parameter that penalizes the errors asymmetrically. Consider that we have a binary classification problem, which is represented by a data set {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, where x_i ∈ R^k represents a k-dimensional data point and ...
Class-based Aggressive Feature Selection for Polynomial Networks Text Classifiers – an Empirical Study
Feature Selection (FS) is a crucial preprocessing step in Text Classification (TC) systems. FS can be either Class-Based or Corpus-Based. Polynomial Network (PN) classifiers have proved recently to be competitive in TC using a very small subset of corpora features. This paper presents an empirical study of the performance of PN classifiers using Aggressive Class-Based FS. Seven of the state-of-t...
Feature Selection using Distributed Ensemble Classifiers for Very Large Datasets
Datasets are becoming larger and there is an acute need to use data mining techniques to exploit the available data. The increasing size of the datasets poses a challenge to the data miners, which can be solved using two approaches – high speed computing systems, and pre-processing techniques. In this paper, we propose a solution combining the above two techniques using a distributed feature se...
Robustness of learning techniques in handling class noise in imbalanced datasets
Many real world datasets exhibit skewed class distributions in which almost all instances are allotted to a class and far fewer instances to a smaller, but more interesting class. A classifier induced from an imbalanced dataset has a low error rate for the majority class and an undesirable error rate for the minority class. Many research efforts have been made to deal with class noise but none ...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3051969